[OpenTelemetry] Improve HistogramExplicitBounds#7165
Conversation
Apply recommendations from open-telemetry#3428: - Use SIMD vectorization to find buckets. - Use radix sort. - Apply cap to histogram explicit bounds (10M). Written primarily by Copilot through an iterative approach.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7165 +/- ##
==========================================
- Coverage 89.82% 89.75% -0.08%
==========================================
Files 276 276
Lines 14738 14822 +84
==========================================
+ Hits 13238 13303 +65
- Misses 1500 1519 +19
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Add missing patch coverage.
Fix CS8604 warning.
This comment was marked as outdated.
This comment was marked as outdated.
Add note for histogram boundary upper limit.
There was a problem hiding this comment.
Pull request overview
Improves histogram explicit-bound bucket lookup performance in the metrics pipeline, and adds a hard cap on the number of explicit boundaries to prevent pathological allocations/processing.
Changes:
- Replaces the previous large-bound lookup strategy with a radix-partitioned search + SIMD-accelerated linear scan fallback.
- Adds a 10,000,000 max explicit boundary count check for both view configuration and instrument advice.
- Adds unit tests, fuzz/property tests, and benchmarks; updates changelog and solution/project wiring.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| test/OpenTelemetry.Tests/Metrics/MetricViewTests.cs | Adds unit test validating boundary-count limit exception details. |
| test/OpenTelemetry.Tests/Metrics/AggregatorTests.cs | Adds coverage for large-bound lookup edge cases (mixed signs, infinities, NaN, degenerate NaN bounds) and display-bounds cleanup. |
| test/OpenTelemetry.FuzzTests/OpenTelemetry.FuzzTests.csproj | New fuzz test project for OpenTelemetry core metrics. |
| test/OpenTelemetry.FuzzTests/Metrics/HistogramExplicitBoundsFuzzTests.cs | Adds FsCheck property tests comparing optimized path vs scalar baseline. |
| test/Benchmarks/Metrics/HistogramExplicitBoundsBenchmarks.cs | Adds BenchmarkDotNet benchmarks for bucket lookup across sizes/layouts. |
| src/OpenTelemetry/OpenTelemetry.csproj | Grants InternalsVisibleTo for the new OpenTelemetry.FuzzTests project. |
| src/OpenTelemetry/Metrics/View/ExplicitBucketHistogramConfiguration.cs | Introduces MaxBoundaryCount and enforces it when setting Boundaries. |
| src/OpenTelemetry/Metrics/MetricStreamIdentity.cs | Enforces MaxBoundaryCount for instrument advice-provided boundaries. |
| src/OpenTelemetry/Metrics/HistogramExplicitBounds.cs | Implements radix-partitioned + SIMD-assisted bucket index lookup and infinity cleanup. |
| src/OpenTelemetry/CHANGELOG.md | Documents the explicit-boundary cap as a breaking change. |
| OpenTelemetry.slnx | Adds the new fuzz test project to the solution. |
Comments suppressed due to low confidence (1)
src/OpenTelemetry/Metrics/View/ExplicitBucketHistogramConfiguration.cs:74
IsSortedAndDistinctcurrently treatsdouble.NaNas valid because all comparisons with NaN return false, so arrays containing NaN can pass validation even though they are not meaningfully “ascending order with distinct values”. Consider explicitly rejecting NaN (and possibly non-finite values) in this check soBoundariesvalidation matches the documented requirements and downstream bucketing assumptions.
for (var i = 1; i < values.Length; i++)
{
if (values[i] <= values[i - 1])
{
return false;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Fix incorrect ArgumentException parameter name.
|
@cijothomas, could you please check this PR? Especially adding boundaries? |
There was a problem hiding this comment.
Some direct feedback while reviewing by Codex, in general LGTM.
It will be great to see Cijo final review, but I will not block this.
@martincostello, I think that you need to solve problems with MemoryMarshal, your preious changes are working ;)
Remove `MemoryMarshal` usage after merging with main to remove unsafe code.
|
Need to re-run the benchmarks with the latest changes. |
Benchmark comparison:
|
| Scenario (BoundCount / Layout) | Method | Baseline | Candidate | Δ |
|---|---|---|---|---|
| 49 / MixedSigned | LookupExactBoundary | 10.70 ns | 5.14 ns | −52% |
| 49 / MixedSigned | LookupInRangeValue | 10.89 ns | 5.03 ns | −54% |
| 49 / MixedSigned | LookupWithInfiniteBounds | 10.45 ns | 5.12 ns | −51% |
| 49 / MixedSigned | LookupPositiveInfinity | 20.06 ns | 0.54 ns | −97% |
| 49 / MixedSigned | LookupNaN | 15.67 ns | 0.05 ns | −100% |
| 49 / PositiveOnly | LookupExactBoundary | 10.41 ns | 4.98 ns | −52% |
| 49 / PositiveOnly | LookupInRangeValue | 10.34 ns | 4.85 ns | −53% |
| 49 / PositiveOnly | LookupPositiveInfinity | 21.02 ns | 0.46 ns | −98% |
| 49 / PositiveOnly | LookupNaN | 15.45 ns | ~0 ns | −100% |
| 10 / MixedSigned | LookupPositiveInfinity | 7.84 ns | 0.64 ns | −92% |
| 10 / MixedSigned | LookupNaN | 2.87 ns | 0.02 ns | −99% |
| 10 / MixedSigned | LookupExactBoundary | 2.47 ns | 2.18 ns | −12% |
| 10 / MixedSigned | LookupWithInfiniteBounds | 2.50 ns | 2.26 ns | −10% |
| 10 / PositiveOnly | LookupPositiveInfinity | 4.48 ns | 0.50 ns | −89% |
| 10 / PositiveOnly | LookupNaN | 3.04 ns | ~0 ns | −100% |
| 10 / PositiveOnly | LookupExactBoundary | 2.48 ns | 2.61 ns | +5% (noise) |
| 10 / PositiveOnly | LookupInRangeValue | 2.57 ns | 2.41 ns | −6% |
The ±Infinity improvement also holds for the 50 and 1000 bound counts (e.g. LookupNegativeInfinity at 1000/MixedSigned: 3.29 ns → 0.18 ns; LookupPositiveInfinity: 4.13 ns → 0.48 ns) — these baseline cells were not dead-code-eliminated, so they're trustworthy.
Conclusion
A clear performance win with no allocation cost: infinity/NaN edge-case lookups are now effectively free, and the common finite-value lookups are ~50% faster once the bound array is large enough to matter. No genuine regressions were found — the only "slower" numbers come from baseline cells the JIT had optimized away, where the candidate is simply being measured doing real work.
Note: the .NET Framework 4.6.2 rows show the same trend and the same dead-code-elimination artifacts at BoundCount 50/1000.
Fixes #3428
Changes
Apply recommendations from #3428:
Written primarily by Copilot through an iterative approach.
Final benchmark results are shown below: #7165 (comment)
I haven't benchmarked the ARM64 SIMD branch as I don't have the hardware for that locally.
Merge requirement checklist
CHANGELOG.mdfiles updated for non-trivial changesChanges in public API reviewed (if applicable)